Overview

This analysis addresses three major knowledge gaps for ectomycorrhizal fungi (EMF) in Canada:

  • Eltonian Shortfall: Gap in knowledge about species interactions
  • Hutchinsonian Shortfall: Gap in knowledge about species’ abiotic niches
  • Wallacean Shortfall: Gap in knowledge about species distributions

The analysis combines tree species distribution data, mycorrhizal databases, and EMF sequence data to quantify these knowledge gaps across Canadian ecosystems. We examine EMF diversity at multiple taxonomic levels including sequence-based taxa (Other_ID and UNITE_ID), genera, and species.


Methods

Data Processing Pipeline

The analysis follows a structured pipeline implemented across multiple R scripts:

  1. Setup (01_setup.R): Environment configuration and library loading
  2. Data Download (02_download_data.R): Acquisition of external datasets
  3. Spatial Processing (03_process_spatial.R): Geographic data preparation
  4. Fungal Data Processing (04_process_fungal.R): EMF and mycorrhizal data integration
  5. Metrics Calculation (05_calculate_metrics.R): Richness and coverage analysis
  6. Visualization (06_create_maps.R): Map creation and spatial visualization
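
The six steps above could be chained with a small driver. The following is only a sketch: the script names come from the list, but `run_pipeline()`, its `script_dir` argument, and the choice to share one environment across steps are assumptions for illustration, not part of the original project.

```r
# Script names are taken from the pipeline list above.
pipeline_scripts <- c(
  "01_setup.R",
  "02_download_data.R",
  "03_process_spatial.R",
  "04_process_fungal.R",
  "05_calculate_metrics.R",
  "06_create_maps.R"
)

# Hypothetical driver: checks that every script exists, then runs them in order.
run_pipeline <- function(script_dir = ".", scripts = pipeline_scripts) {
  paths <- file.path(script_dir, scripts)
  missing <- paths[!file.exists(paths)]
  if (length(missing) > 0) {
    stop("Missing pipeline scripts: ", paste(missing, collapse = ", "))
  }
  # One shared environment, so objects created by earlier steps stay visible.
  env <- new.env()
  for (p in paths) {
    message("Running ", basename(p))
    source(p, local = env)
  }
  invisible(env)
}
```

Calling `run_pipeline("path/to/scripts")` stops before running anything if a script is missing, so a partial run does not leave outputs half-written.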

Data Sources

  • FungalRoot Database: Mycorrhizal associations (Soudzilovskaia et al., 2020)
  • Tree Range Maps: Little’s range maps from USGS
  • EMF Sequence Data: Provided by Stephanie Kivlin
  • EMF Sampling Locations in Canada: Provided by Stephanie Kivlin and taken from van Galen et al. (2025)
  • Ecoregions: National Ecological Framework for Canada
  • Administrative Boundaries: GADM database

Taxonomic Classification

EMF diversity is analyzed at four taxonomic levels:

  • Other_ID: Sequence-based taxon codes from the "Other" database (e.g., UDB0780173)
  • UNITE_ID: Species hypothesis (SH) codes from the UNITE database (e.g., SH1132005.09FU)
  • Genus: Extracted from EM_Species assignments
  • Species: Lowest taxonomic assignment possible from EM_Species
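
As an illustration of the genus extraction, the sketch below assumes that EM_Species values use an underscore between genus and epithet (as in the result tables, e.g. Cortinarius_decipiens) and that genus-only assignments contain no underscore. The helper name `genus_from_species()` is hypothetical.

```r
# Hypothetical helper: the genus is the text before the first underscore.
# Genus-only assignments (no underscore) are returned unchanged.
genus_from_species <- function(em_species) {
  sub("_.*$", "", em_species)
}

em_species <- c("Cortinarius_decipiens", "Tomentella", "Cenococcum_geophilum")
genus_from_species(em_species)
# "Cortinarius" "Tomentella" "Cenococcum"
```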


Results

Summary Statistics

EMF Research Coverage in Canada

Summary statistics for EMF research coverage in Canada
Metric                                      Value
Total EMF sequence records                  6815
Total unique sampling locations (all data)  815
Unique locations with EMF data              367
Unique Other_ID values                      1034
Unique UNITE_ID values                      255
Unique EMF genera                           139
Unique EMF species                          807
Canadian EMF host tree species              99
Host species with sequence data             11
Percentage of host species with data        11.11%
Unique host genera with data                7
Percentage of host genera with data         21.21%
Total ecoregions                            218
Ecoregions with potential EMF habitat       176
Ecoregions sampled for EMF                  51
Percentage of habitat ecoregions unsampled  71.02%
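
The two headline percentages can be checked by hand. This is a worked sketch rather than the project's own code: the counts are the ones reported in this document, with the habitat ecoregion total taken as the 176 stated in the Analysis Summary ("51 of 176"). The `pct()` helper is hypothetical.

```r
# Hypothetical helper: percentage rounded to two decimals.
pct <- function(part, whole) round(100 * part / whole, 2)

host_species_total     <- 99   # Canadian EMF host tree species
host_species_with_data <- 11   # host species with sequence data
habitat_ecoregions     <- 176  # as stated in the Analysis Summary
ecoregions_sampled     <- 51

host_pct      <- pct(host_species_with_data, host_species_total)
unsampled_pct <- pct(habitat_ecoregions - ecoregions_sampled, habitat_ecoregions)

host_pct       # 11.11
unsampled_pct  # 71.02
```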

Knowledge Gap Analysis

Analysis of major knowledge shortfalls in EMF research
Eltonian Shortfall
  Description: Gap in knowledge about species interactions
  Key Finding: 11.11% of Canadian EMF host tree species have associated fungal sequence data; 807 unique EMF species identified

Hutchinsonian Shortfall
  Description: Gap in knowledge about species’ abiotic niches
  Key Finding: 71.02% of habitat ecoregions remain unsampled

Wallacean Shortfall
  Description: Gap in knowledge about species distributions
  Key Finding: 815 total unique sampling locations; 367 locations with EMF sequence data; 1034 unique Other_ID and 255 unique UNITE_ID sequence-based taxa

Comprehensive EMF Taxonomic Diversity

EMF taxonomic diversity and sampling intensity across different taxonomic levels
Taxonomic Level  Unique Taxa  Mean Locations/Taxon  Max Locations/Taxon  Host Species  Mean Taxa/Host
Other_ID         1034         1.74                  11                   16            64.56
UNITE_ID         255          1.51                  11                   15            17.73
Genus            139          16.93                 196                  16            20.31
Species          807          5.85                  94                   16            53.88

Wallacean Shortfall: Sampling Intensity Analysis

Sampling intensity analysis showing distribution gaps (Wallacean Shortfall)
Taxonomic Level  Unique Taxa  Mean Locations  Max Locations  Min Locations
Other_ID         1034         1.74            11             1
UNITE_ID         255          1.51            11             1
Genus            139          16.93           196            1
Species          807          5.85            94             1
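
A sampling-intensity summary of this shape can be derived from a table of records by counting distinct locations per taxon. The sketch below uses toy data; the column names `taxon` and `location_id` and the function `sampling_intensity()` are assumptions, not the names used in 05_calculate_metrics.R.

```r
# Toy records (not real data); repeated rows mimic multiple sequences per site.
records <- data.frame(
  taxon       = c("A", "A", "A", "B", "B", "C"),
  location_id = c("s1", "s1", "s2", "s1", "s3", "s2")
)

# Hypothetical helper: one summary row for a given taxonomic level.
sampling_intensity <- function(records) {
  # Keep one row per taxon-location pair so repeat records are not double counted.
  unique_pairs <- unique(records[, c("taxon", "location_id")])
  n_locations <- as.vector(table(unique_pairs$taxon))
  data.frame(
    unique_taxa    = length(n_locations),
    mean_locations = round(mean(n_locations), 2),
    max_locations  = max(n_locations),
    min_locations  = min(n_locations)
  )
}

sampling_intensity(records)
```

With the toy data, taxa A and B each occur at two locations and C at one, giving three taxa with a mean of 1.67 locations.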

Eltonian Shortfall: Host-Taxon Associations

Host-taxon association analysis showing interaction gaps (Eltonian Shortfall)
Taxonomic Level  Host Species  EMF Taxa  Mean Taxa/Host  Max Taxa/Host  Mean Hosts/Taxon
Other_ID         16            752       64.56           266            1.37
UNITE_ID         15            206       17.73           97             1.29
Genus            16            95        20.31           53             3.42
Species          16            477       53.88           184            1.81
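
The per-host and per-taxon counts in a table like this follow from a host-by-taxon presence matrix. A minimal sketch with toy data is shown here; the column names `host` and `taxon` are assumptions, and the project's own matrices (for example host_taxon_matrix_species.csv) may be built differently.

```r
# Toy associations (not real data): one row per sequence record.
assoc <- data.frame(
  host  = c("host_A", "host_A", "host_A", "host_B", "host_B"),
  taxon = c("taxon_1", "taxon_2", "taxon_1", "taxon_1", "taxon_3")
)

# Presence/absence matrix: rows are hosts, columns are EMF taxa.
host_taxon <- unclass(table(assoc$host, assoc$taxon)) > 0

taxa_per_host   <- rowSums(host_taxon)  # distinct taxa recorded on each host
hosts_per_taxon <- colSums(host_taxon)  # distinct hosts recorded for each taxon

mean(taxa_per_host)    # analogous to Mean Taxa/Host
max(taxa_per_host)     # analogous to Max Taxa/Host
mean(hosts_per_taxon)  # analogous to Mean Hosts/Taxon
```

Converting counts to presence/absence first means a taxon sequenced many times on one host still counts as a single association.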

Spatial Coverage Analysis

Spatial analysis of EMF coverage across 1° × 1° grid cells
Spatial Metric                Value
Grid cells with species data  1600
Cells with zero EMF coverage  330
Cells with EMF data           1270
Mean EMF coverage proportion  0.229
Maximum EMF coverage          1
Range of coverage             0 - 1
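
To make the coverage metric concrete, the sketch below computes, for each 1° × 1° cell, the proportion of tree species present that have EMF sequence data anywhere. It is a simplification under stated assumptions: the original analysis works from range maps, whereas this toy example uses point coordinates, and all object and column names are hypothetical.

```r
# Toy input (not real data): tree species presences as points.
trees <- data.frame(
  species = c("sp1", "sp2", "sp3", "sp1", "sp3"),
  lon     = c(-123.4, -123.9, -123.1, -75.2, -75.8),
  lat     = c(49.2, 49.7, 49.5, 45.1, 45.6)
)
species_with_emf <- c("sp1")  # species with EMF sequence data somewhere

# floor() maps each coordinate to the south-west corner of its 1-degree cell.
trees$cell    <- paste(floor(trees$lon), floor(trees$lat), sep = "_")
trees$has_emf <- trees$species %in% species_with_emf

# One row per cell-species pair, then the share of species with EMF data.
cell_species <- unique(trees[, c("cell", "species", "has_emf")])
coverage <- tapply(cell_species$has_emf, cell_species$cell, mean)

coverage            # proportion per cell
mean(coverage)      # analogous to the mean EMF coverage proportion
sum(coverage == 0)  # analogous to cells with zero EMF coverage
```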

Spatial Analysis and Visualization

Sampling Distribution

Distribution of EMF sampling locations across Canadian ecoregions. Points are colored by data source and aggregated within a 1 km radius.

EMF Host and Sampling Coverage

Number of tree species within each 1° × 1° grid cell that have associated EMF sequence data somewhere in Canada.


Proportion of tree species per 1° × 1° grid cell that have associated EMF sequence data somewhere in Canada.


Bivariate map showing both the number (richness) of EMF host tree species and the proportion of host species in the grid cell with EMF records somewhere in Canada.


Detailed Taxonomic Analysis

Most Well-Sampled Taxa

Top 10 most well-sampled EMF species
Species                Unique Locations  Total Records
Lachnum_virgineum      94                94
Cenococcum_geophilum   60                60
Cortinarius_decipiens  56                106
Tylospora_asterophora  54                89
Cortinarius_croceus    52                56
Geopyxis_carbonaria    50                51
Tomentella_stuposa     48                52
Tomentella             42                154
Cortinarius_casimiri   36                70
Mycena_aetites         35                35

Top 10 most well-sampled EMF genera
Genus        Unique Locations  Total Records
Cortinarius  196               1708
Tomentella   105               359
Lachnum      96                115
Tricholoma   94                251
Inocybe      88                420
Russula      81                361
Mycena       80                187
Hebeloma     63                164
Cenococcum   60                60
Peziza       58                72

Top 10 most well-sampled UNITE_ID taxa
UNITE_ID        Unique Locations  Total Records
SH1132005.09FU  11                26
SH1648320.08FU  10                14
SH1571570.08FU  9                 21
SH0924970.09FU  8                 8
SH1156627.09FU  8                 13
SH1563787.08FU  7                 28
SH1155880.09FU  6                 6
SH1067874.09FU  5                 7
SH1138802.09FU  5                 8
SH1295836.09FU  5                 5

Top 10 most well-sampled Other_ID taxa
Other_ID    Unique Locations  Total Records
UDB0780173  11                19
UDB027969   10                16
UDB018564   9                 17
UDB034984   9                 54
UDB0754284  9                 16
UDB001746   8                 16
UDB004960   8                 9
UDB016650   8                 8
UDB023537   8                 16
UDB026053   8                 17

Host Associations

Top 10 host tree species by number of associated EMF species
Host Species           Unique EMF Species  Total Records
Pseudotsuga_menziesii  184                 381
Pinus_albicaulis       108                 325
Picea_engelmannii      104                 149
Populus_tremuloides    84                  91
Tsuga_heterophylla     59                  786
Pinus_contorta         55                  258
Populus_sp             49                  190
Pinus_banksiana        47                  123
Salix_arctica          46                  135
Dryas_integrifolia     37                  105

Top 10 EMF species by number of associated host species
EMF Species                Unique Hosts  Total Records
Tomentella                 13            119
Meliniomyces               10            101
Cortinarius_decipiens      9             62
Sebacina                   9             33
Cortinarius                8             105
Sebacina_dimitica          8             29
Thelephora_terrestris      8             24
Tylospora_asterophora      8             40
Piloderma_sphaerosporum    7             55
Tomentella_cinereoumbrina  7             12

Knowledge Gaps

Analysis Summary

This comprehensive analysis reveals key knowledge shortfalls in EMF research across Canada:

Eltonian Shortfall (Species Interactions):

  • 11.11% of Canadian EMF host tree species have associated fungal sequence data
  • 1034 unique Other_ID sequence-based taxa identified
  • 255 unique UNITE_ID sequence-based taxa identified
  • 139 unique EMF genera documented
  • 807 unique EMF species documented
  • Host-taxon association matrices created for all taxonomic levels

Hutchinsonian Shortfall (Environmental Niches):

  • 71.02% of habitat ecoregions remain unsampled
  • 51 of 176 potential habitat ecoregions have been sampled

Wallacean Shortfall (Species Distributions):

  • 815 total unique sampling locations across Canada
  • 367 locations with detailed EMF sequence data
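  • Sampling intensity varies widely among taxa: mean locations per taxon range from 1.51 (UNITE_ID) to 16.93 (genus), and every taxonomic level includes taxa recorded at only one location
  • Large geographic gaps remain in sampling coverage: 330 of the 1600 grid cells with species data have zero EMF coverage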

Data Availability

All analysis outputs are available in the project’s outdata/ directory:

Key analysis output files and their availability
File Name                        Description                                           Available
comprehensive_emf_summary.csv    Summary statistics across all taxonomic levels        TRUE
locations_per_other_id.csv       Sampling locations per Other_ID sequence-based taxon  TRUE
locations_per_unite_id.csv       Sampling locations per UNITE_ID sequence-based taxon  TRUE
locations_per_genus.csv          Sampling locations per EMF genus                      TRUE
locations_per_species.csv        Sampling locations per EMF species                    TRUE
host_taxon_matrix_species.csv    Host-species association matrix                       TRUE
emf_taxa_per_host_species.csv    EMF species counts per host species                   TRUE
hosts_per_emf_taxon_species.csv  Host species counts per EMF species                   TRUE
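
The Available column can be reproduced with a simple existence check. This is a sketch under the assumption that the files sit directly in outdata/ relative to the working directory; `check_outputs()` is a hypothetical helper, not a function from the project.

```r
# File names are taken from the table above.
output_files <- c(
  "comprehensive_emf_summary.csv",
  "locations_per_other_id.csv",
  "locations_per_unite_id.csv",
  "locations_per_genus.csv",
  "locations_per_species.csv",
  "host_taxon_matrix_species.csv",
  "emf_taxa_per_host_species.csv",
  "hosts_per_emf_taxon_species.csv"
)

# Hypothetical helper: reports which expected outputs exist in out_dir.
check_outputs <- function(out_dir = "outdata", files = output_files) {
  data.frame(
    file_name = files,
    available = file.exists(file.path(out_dir, files)),
    stringsAsFactors = FALSE
  )
}

check_outputs()
```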

Session Information

R session information for reproducibility
Component      Details
R Version      4.4.2
Platform       x86_64-apple-darwin20
Running under  macOS Ventura 13.7.6
Locale         en_US.UTF-8/en_CA.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

References

  • Soudzilovskaia, N.A., et al. (2020). FungalRoot: global online database of plant mycorrhizal associations. New Phytologist, 227, 955–966.
  • Van Galen, L.G., Corrales, A., Truong, C., Van Den Hoogen, J., Kumar, S., Manley, B.F., et al. (2025). The biogeography and conservation of Earth’s ‘dark’ ectomycorrhizal fungi. Current Biology, 35, R563–R574.
  • Little’s range maps via the USTreeAtlas repository
  • National Ecological Framework for Canada
  • GADM database (gadm.org)

Analysis completed on 2025-07-17
For questions about this analysis, contact the corresponding authors.